Practical and Optimal LSH for Angular Distance
Abstract
We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [1, 2]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [3] in practice. We also introduce a multiprobe version of this algorithm, and conduct experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.
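For orientation, the hyperplane LSH baseline that the abstract compares against can be sketched in a few lines. This is not the family introduced in the paper; it is only the classical sign-of-random-projection scheme, in which a single hash bit collides for two vectors at angle θ with probability 1 − θ/π. The function names (`make_hyperplane_hash`, `collision_rate`) and the parameters `num_bits` and `seed` are illustrative assumptions, not taken from the source.

```python
import math
import random


def make_hyperplane_hash(dim, num_bits, seed=0):
    """Return a function mapping a vector to a tuple of num_bits sign bits.

    Hypothetical helper for illustration: each bit uses its own random
    Gaussian normal vector, i.e. its own random hyperplane through the origin.
    """
    rng = random.Random(seed)
    normals = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

    def hash_vector(v):
        # Each bit records which side of one random hyperplane v falls on.
        return tuple(
            1 if sum(a * b for a, b in zip(n, v)) >= 0.0 else 0
            for n in normals
        )

    return hash_vector


def angle(u, v):
    """Angle between u and v in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def collision_rate(u, v, num_bits=2000, seed=0):
    """Fraction of independent hash bits on which u and v agree."""
    h = make_hyperplane_hash(len(u), num_bits, seed)
    hu, hv = h(u), h(v)
    return sum(1 for a, b in zip(hu, hv) if a == b) / num_bits


if __name__ == "__main__":
    u, v = [1.0, 0.0], [0.0, 1.0]
    print("empirical:", collision_rate(u, v))
    print("predicted:", 1.0 - angle(u, v) / math.pi)
```

The empirical collision rate should approach 1 − θ/π as `num_bits` grows. In a near neighbor index, several such bits would be concatenated into a bucket key and several independent tables would be used; that part is omitted here.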
Related papers
Generic LSH Families for the Angular Distance Based on Johnson-Lindenstrauss Projections and Feature Hashing LSH
In this paper we propose the creation of generic LSH families for the angular distance based on Johnson-Lindenstrauss projections. We show that feature hashing is a valid J-L projection and propose two new LSH families based on feature hashing. These new LSH families are tested on both synthetic and real datasets with very good results and a considerable performance improvement over other LSH f...
Sieving for Shortest Vectors in Lattices Using Angular Locality-Sensitive Hashing
By replacing the brute-force list search in sieving algorithms with Charikar’s angular locality-sensitive hashing (LSH) method, we get both theoretical and practical speedups for solving the shortest vector problem (SVP) on lattices. Combining angular LSH with a variant of Nguyen and Vidick’s heuristic sieve algorithm, we obtain heuristic time and space complexities for solving SVP in dimension ...
S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions
To compare the similarity of probability distributions, information-theoretically motivated metrics such as Kullback-Leibler divergence (KL) and Jensen-Shannon divergence (JSD) are often more reasonable than metrics for vectors such as Euclidean and angular distance. However, existing locality-sensitive hashing (LSH) algorithms cannot support the information-theoretically motivated metric...
LSH Forest: Practical Algorithms Made Theoretical
We analyze LSH Forest [BCG05]—a popular heuristic for the nearest neighbor search—and show that a careful yet simple modification of it outperforms “vanilla” LSH algorithms. The end result is the first instance of a simple, practical algorithm that provably leverages data-dependent hashing to improve upon data-oblivious LSH. Here is the entire algorithm for the d-dimensional Hamming space. The ...
Practical linear-space Approximate Near Neighbors in high dimension
The c-approximate Near Neighbor problem in high dimensional spaces has been mainly addressed by Locality Sensitive Hashing (LSH), which offers polynomial dependence on the dimension, query time sublinear in the size of the dataset, and subquadratic space requirement. For practical applications, linear space is typically imperative. Most previous work in the linear space regime focuses on the ca...
Publication date: 2015